-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: remove dependency on KFP lib + eval serving endpoint support + ci #76
Conversation
ffc54d0
to
42bcfb4
Compare
standalone/README.md
Outdated
--namespace my-namespace \ | ||
--eval-serving-endpoint http://serving.kubeflow.svc.cluster.local:8080/v1 \ | ||
--eval-serving-model-name my-model \ | ||
--eval-serving-model-api-key ***** \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
canthis be judge-serving
instead of eval-serving
? ilab uses the term judge_model
and --judge-model
(CLI doesn't yet support external endpoint or api_key though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--judge-serving-model-name
--judge-serving-endpoint
--judge-serving-model-api-key
We now use a generic python executor to call the eval and any DSL component functions. They are almost identical, only the signature is updated to not use KFP Classes. Signed-off-by: Sébastien Han <seb@redhat.com>
During the initial MT-Bench eval, we must connect to a serving endpoint. The script has been updated with new required flags: * `--eval-serving-endpoint`: Serving endpoint for evaluation. e.g: http://serving.kubeflow.svc.cluster.local:8080/v1 - **Required** * `--eval-serving-model-name`: The name of the model to use for evaluation. **Required** * `--eval-serving-model-api-key`: The API key for the model to evaluate. `EVAL_SERVING_MODEL_API_KEY` environment variable can be used as well. **Required** Signed-off-by: Sébastien Han <seb@redhat.com>
If someone pushes code changes and the pipeline.yaml has not been updated accordingly, the CI will fail and ask to update it and push again. Also, you can now generate a pipeline with `make pipeline`. Signed-off-by: Sébastien Han <seb@redhat.com>
The tarball located on S3 must contains both SDG data and the model to train. Signed-off-by: Sébastien Han <seb@redhat.com>
3612fe7
to
a645a1e
Compare
d5de14c
to
3e5d1fc
Compare
export TRITON_CACHE_DIR=/tmp | ||
export HF_HOME=/tmp | ||
export TRANSFORMERS_CACHE=/tmp | ||
torchrun --nnodes {nnodes} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't seem fully equivalent to running the run_main_ds.py
, e.g. for options validation. Maybe the logic from the run_training
in instructlab.training.main_ds
could be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm deferring this to @MichaelClifford who originally implemented this code.
36ba1f9
to
8a1f082
Compare
Sorry, I went really far with this one but I can confirm that: * sdg-data-fetch is working * data processing works * training phase 1 is working * training phase 2 is working Also: * remove backtick from the code since it breaks the shell that runs the python executor * only use a single PVC for everything: sdg data, model, trained model * --force-pull: to force pulling from the object store again if the data are already present Signed-off-by: Sébastien Han <seb@redhat.com>
a257f06
to
41990c7
Compare
The pipeline excepts to have all the functions at its disposal to run. So helper packages cannot be used. In this case, helpers only works because the current eval is using a custom image. Let's move all the logic inside the component. Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @leseb! I've tested this successfully through SDG downloading and training. Going to merge this now. Any other changes we can make in a follow up PR 😃
195862b feat: remove dependency on KFP lib
7d8d314 feat: add serving endpoint details for eval
6852064 ci: add a check to ensure the pipeline is up to date
ca03343 fix: clarify the structure of the initial tarball
2b9d01e bulk commit
2eb8e1c fix: eval, do not use external deps
8a3cef9 misc: use the correct images
commit 195862b
Author: Sébastien Han seb@redhat.com
Date: Tue Oct 8 22:41:10 2024 +0200
commit 7d8d314
Author: Sébastien Han seb@redhat.com
Date: Wed Oct 9 09:54:26 2024 +0200
commit 6852064
Author: Sébastien Han seb@redhat.com
Date: Wed Oct 9 09:48:25 2024 +0200
commit ca03343
Author: Sébastien Han seb@redhat.com
Date: Wed Oct 9 14:40:27 2024 +0200
commit 2b9d01e
Author: Sébastien Han seb@redhat.com
Date: Wed Oct 9 23:14:44 2024 +0200
commit 2eb8e1c
Author: Sébastien Han seb@redhat.com
Date: Thu Oct 10 12:02:14 2024 +0200
commit 8a3cef9
Author: Sébastien Han seb@redhat.com
Date: Thu Oct 10 12:49:27 2024 +0200